Multilingual Ontologies for Cross-Language Information Extraction and Semantic Search
Abstract
Valuable local information is often available on the web, but encoded in a foreign language that non-local users do not understand. Can we create a system that allows a user to query in language L1 for facts in a web page written in language L2? We propose a suite of multilingual extraction ontologies as a solution to this problem. We ground extraction ontologies in each language of interest, and we map both the data and the metadata among the language-specific extraction ontologies. The mappings go through a central, language-agnostic ontology, so adding a new language requires only one mapping (to the central ontology) rather than one for each language pair. Results from an implemented early prototype demonstrate the feasibility of cross-language information extraction and semantic search. Further, results from an experimental evaluation of ontology-based query translation and extraction accuracy are remarkably good given the complexity of the problem and the complications of its implementation.
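The abstract does not describe concrete data structures, so the following is only a minimal sketch, under stated assumptions, of the hub-and-spoke idea: each language-specific ontology is reduced to a dictionary from local concept labels to IDs in a central, language-agnostic ontology, and a query term is translated in two hops (source to central, central to target). The language codes, labels, concept IDs, and function names (`to_central`, `from_central`, `translate_query`, `mappings_needed`) are all hypothetical. Only metadata (concept-label) mapping is shown; mapping of data values such as dates or currencies is omitted.

```python
"""Minimal sketch of pivot-based (hub-and-spoke) cross-language concept mapping.

Assumption: each language-specific extraction ontology is reduced here to a
dict from local labels to IDs in a central, language-agnostic ontology.
All concept IDs, labels, and function names are hypothetical.
"""
from __future__ import annotations

# Hypothetical metadata mappings: local concept label -> central concept ID.
LANGUAGE_ONTOLOGIES: dict[str, dict[str, str]] = {
    "en": {"price": "Price", "city": "City", "date": "Date"},
    "fr": {"prix": "Price", "ville": "City", "date": "Date"},
    "es": {"precio": "Price", "ciudad": "City", "fecha": "Date"},
}


def to_central(lang: str, label: str) -> str:
    """Map a label from one language-specific ontology to the central ontology."""
    return LANGUAGE_ONTOLOGIES[lang][label.lower()]


def from_central(lang: str, concept: str) -> str:
    """Map a central concept ID back to the target language's label."""
    for label, cid in LANGUAGE_ONTOLOGIES[lang].items():
        if cid == concept:
            return label
    raise KeyError(f"{concept!r} has no label in {lang!r}")


def translate_query(terms: list[str], source: str, target: str) -> list[str]:
    """Translate query concepts source -> central -> target (two hops via the hub)."""
    return [from_central(target, to_central(source, t)) for t in terms]


def mappings_needed(n_languages: int) -> tuple[int, int]:
    """Return (hub mappings, pairwise mappings) for n languages."""
    hub = n_languages  # one mapping per language, to the central ontology
    pairwise = n_languages * (n_languages - 1) // 2  # one per unordered pair
    return hub, pairwise


if __name__ == "__main__":
    print(translate_query(["Price", "city"], "en", "fr"))  # ['prix', 'ville']
    print(mappings_needed(3))   # (3, 3)
    print(mappings_needed(10))  # (10, 45)
```

With three languages the two schemes need the same number of mappings, but the pairwise count grows quadratically: ten languages need 10 hub mappings versus 45 pairwise ones, and adding an eleventh language costs one new hub mapping instead of ten.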
Related articles
Multilingual Extraction Ontologies
The growth of multilingual web content and increasing internationalization portends the need for cross-language query processing. We offer ML-OntoES (a MultiLingual Ontology-based Extraction System) as a solution for narrow-domain, data-rich applications. Based on language-independent extraction ontologies (Embley, Liddle, & Lonsdale, 2011), ML-OntoES enables semantic search over domain-specific,...
Cross-Language Hybrid Keyword and Semantic Search
The growth of multilingual web content and increasing internationalization portends the need for cross-language information retrieval. As a solution to this problem for narrow-domain, data-rich web content, we offer ML-HyKSS: MultiLingual Hybrid Keyword and Semantic Search. The key component of ML-HyKSS is a collection of linguistically grounded conceptual-model instances called extraction onto...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Cross-Lingual Information Retrieval and Semantic Interoperability for Cultural Heritage Repositories
This paper describes a computational linguistics-based approach for providing interoperability between multi-lingual systems in order to overcome crucial issues like cross-language and cross-collection retrieval. Our proposal is a system which improves capabilities of language-technology-based information extraction. In the last few years various theories have been developed and applied for mak...
Multi-word processing in an ontology-based Cross-Language Information Retrieval model for specific domain collections
This paper proposes a methodological approach to CLIR applications for the development of a system which improves multi-word processing when specific domain translation is required. The system is based on a multilingual ontology, which can improve both translation and retrieval accuracy and effectiveness. The proposed framework allows mapping data and metadata among language-specific ontologies...
Publication date: 2011